Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning

Qu, Chao, Mannor, Shie, Xu, Huan, Qi, Yuan, Song, Le, Xiong, Junwu

Neural Information Processing Systems

We consider the networked multi-agent reinforcement learning (MARL) problem in a fully decentralized setting, where agents learn to coordinate to achieve joint success. This problem arises in many areas, including traffic control, distributed control, and smart grids. We assume each agent is located at a node of a communication network and can exchange information only with its neighbors. Using softmax temporal consistency, we derive a primal-dual decentralized optimization method and obtain a principled and data-efficient iterative algorithm named {\em value propagation}. We prove a non-asymptotic convergence rate of $\mathcal{O}(1/T)$ with nonlinear function approximation. To the best of our knowledge, this is the first MARL algorithm with a convergence guarantee in the fully decentralized, off-policy, control setting with nonlinear function approximation.
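
For readers unfamiliar with the term, {\em softmax temporal consistency} is the identity that, in entropy-regularized RL with temperature $\tau$ and discount $\gamma$, the optimal value function $V^*$ and policy $\pi^*$ jointly satisfy $V^*(s) = r(s, a) + \gamma V^*(s') - \tau \log \pi^*(a \mid s)$ for every transition $(s, a, s')$; penalizing the squared violation of this identity over $(V, \pi)$ is, roughly, the kind of objective from which the paper's primal-dual formulation is derived.

As an illustration of the decentralized side, the sketch below shows the neighbor-averaging (consensus) primitive that networked primal-dual methods of this kind typically build on. It is not the paper's algorithm: the ring topology, the uniform mixing matrix, and the quadratic local objective are all illustrative assumptions.

    import numpy as np

    # Each agent sits at a node of a communication graph and may only mix its
    # parameters with its neighbors. Ring topology chosen purely for illustration.
    n_agents, dim = 5, 3
    rng = np.random.default_rng(0)
    params = rng.normal(size=(n_agents, dim))  # one parameter vector per agent

    neighbors = {i: [(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)}

    # Doubly stochastic mixing matrix W: uniform weight over self and neighbors.
    W = np.zeros((n_agents, n_agents))
    for i, nbrs in neighbors.items():
        for j in [i] + nbrs:
            W[i, j] = 1.0 / (len(nbrs) + 1)

    step = 0.1
    for _ in range(200):
        params = W @ params      # consensus step: average with neighbors only
        params -= step * params  # local gradient step on 0.5 * ||theta||^2

    print(np.abs(params - params.mean(axis=0)).max())  # disagreement -> ~0
    print(np.abs(params).max())                        # shared iterate -> minimizer (0)

A doubly stochastic mixing matrix is the standard device here: repeated neighbor-only averaging drives all agents toward a common iterate while each agent descends its own local objective, which is the basic mechanism that lets fully decentralized methods reach the fixed point a centralized learner would.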


Reviews: Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning

Neural Information Processing Systems

This paper tackles the problem of decentralized learning in multi-agent environments. While many recent approaches combine centralized learning with decentralized execution, the fully decentralized learning paradigm is motivated by scenarios where a centralized component (e.g., a joint value function) may be too expensive to use or may have undesirable privacy implications. However, previous decentralized learning approaches have not been very effective for multi-agent problems. The paper proposes a new algorithm, value propagation, and proves that it converges in the nonlinear function approximation case. To my knowledge, the value propagation algorithm is novel and interesting.


